Goto

Collaborating Authors

 random feature map




Learning with Group Invariant Features: A Kernel Perspective.

Neural Information Processing Systems

We analyze in this paper a random feature map based on a theory of invariance (\emph{I-theory}) introduced in \cite{AnselmiLRMTP13}. More specifically, a group invariant signal signature is obtained through cumulative distributions of group-transformed random projections. Our analysis bridges invariant feature learning with kernel methods, as we show that this feature map defines an expected Haar-integration kernel that is invariant to the specified group action. We show how this non-linear random feature map approximates this group invariant kernel uniformly on a set of $N$ points. Moreover, we show that it defines a function space that is dense in the equivalent Invariant Reproducing Kernel Hilbert Space.


A Technique for Isolating Lexically-Independent Phonetic Dependencies in Generative CNNs

Šegedin, Bruno Ferenc

arXiv.org Artificial Intelligence

--The ability of deep neural networks (DNNs) to represent phonotactic generalizations derived from lexical learning remains an open question. This study (1) investigates the lexically-invariant generalization capacity of generative convolutional neural networks (CNNs) trained on raw audio waveforms of lexical items and (2) explores the consequences of shrinking the fully-connected layer (FC) bottleneck from 1024 channels to 8 before training. Ultimately, a novel technique for probing a model's lexically-independent generalizations is proposed that works only under the narrow FC bottleneck: generating audio outputs by bypassing the FC and inputting randomized feature maps into the convolutional block. These outputs are equally biased by a phonotactic restriction in training as are outputs generated with the FC. This result shows that the convolutional layers can dynamically generalize phonetic dependencies beyond lexically-constrained configurations learned by the FC.


Tensor Sketch: Fast and Scalable Polynomial Kernel Approximation

Pham, Ninh, Pagh, Rasmus

arXiv.org Artificial Intelligence

Approximation of non-linear kernels using random feature maps has become a powerful technique for scaling kernel methods to large datasets. We propose $\textit{Tensor Sketch}$, an efficient random feature map for approximating polynomial kernels. Given $n$ training samples in $\mathbb{R}^d$ Tensor Sketch computes low-dimensional embeddings in $\mathbb{R}^D$ in time $\mathcal{O}\left( n(d+D \log{D}) \right)$ making it well-suited for high-dimensional and large-scale settings. We provide theoretical guarantees on the approximation error, ensuring the fidelity of the resulting kernel function estimates. We also discuss extensions and highlight applications where Tensor Sketch serves as a central computational tool.


Learning dynamical systems with hit-and-run random feature maps

Mandal, Pinak, Gottwald, Georg A.

arXiv.org Machine Learning

We show how random feature maps can be used to forecast dynamical systems with excellent forecasting skill. We consider the tanh activation function and judiciously choose the internal weights in a data-driven manner such that the resulting features explore the nonlinear, non-saturated regions of the activation function. We introduce skip connections and construct a deep variant of random feature maps by combining several units. To mitigate the curse of dimensionality, we introduce localization where we learn local maps, employing conditional independence. Our modified random feature maps provide excellent forecasting skill for both single trajectory forecasts as well as long-time estimates of statistical properties, for a range of chaotic dynamical systems with dimensions up to 512. In contrast to other methods such as reservoir computers which require extensive hyperparameter tuning, we effectively need to tune only a single hyperparameter, and are able to achieve state-of-the-art forecast skill with much smaller networks.


The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings

Krzysztof M. Choromanski, Mark Rowland, Adrian Weller

Neural Information Processing Systems

We examine a class of embeddings based on structured random matrices with orthogonal rows which can be applied in many machine learning applications including dimensionality reduction and kernel approximation. For both the Johnson-Lindenstrauss transform and the angular kernel, we show that we can select matrices yielding guaranteed improved performance in accuracy and/or speed compared to earlier methods. We introduce matrices with complex entries which give significant further accuracy improvement. We provide geometric and Markov chain-based perspectives to help understand the benefits, and empirical results which suggest that the approach is helpful in a wider range of applications.


On the choice of the non-trainable internal weights in random feature maps

Mandal, Pinak, Gottwald, Georg A.

arXiv.org Machine Learning

The computationally cheap machine learning architecture of random feature maps can be viewed as a single-layer feedforward network in which the weights of the hidden layer are random but fixed and only the outer weights are learned via linear regression. The internal weights are typically chosen from a prescribed distribution. The choice of the internal weights significantly impacts the accuracy of random feature maps. We address here the task of how to best select the internal weights. In particular, we consider the forecasting problem whereby random feature maps are used to learn a one-step propagator map for a dynamical system. We provide a computationally cheap hit-and-run algorithm to select good internal weights which lead to good forecasting skill. We show that the number of good features is the main factor controlling the forecasting skill of random feature maps and acts as an effective feature dimension. Lastly, we compare random feature maps with single-layer feedforward neural networks in which the internal weights are now learned using gradient descent. We find that random feature maps have superior forecasting capabilities whilst having several orders of magnitude lower computational cost.


Learning with Group Invariant Features: A Kernel Perspective

Neural Information Processing Systems

We analyze in this paper a random feature map based on a theory of invariance (I-theory) introduced in [1]. More specifically, a group invariant signal signature is obtained through cumulative distributions of group-transformed random projections. Our analysis bridges invariant feature learning with kernel methods, as we show that this feature map defines an expected Haar-integration kernel that is invariant to the specified group action. We show how this non-linear random feature map approximates this group invariant kernel uniformly on a set of N points. Moreover, we show that it defines a function space that is dense in the equivalent Invariant Reproducing Kernel Hilbert Space. Finally, we quantify error rates of the convergence of the empirical risk minimization, as well as the reduction in the sample complexity of a learning algorithm using such an invariant representation for signal classification, in a classical supervised learning setting.


Scalable Graph Transformers for Million Nodes

#artificialintelligence

Recently, building Transformer models for handling graph-structured data has aroused wide interests in the machine learning research community. One critical challenge stems from the quadratic complexity of global attention that hinders Transformers for scaling to large graphs. This work proposes a scalable graph Transformers for large node classification graphs where the node numbers could vary from thousands to millions (or even more). The key module is a kernelized Gumbel-Softmax-based message passing that achieves all-pair feature propagation within O(N) complexity (N for #nodes). The following content will summarize the main idea and results of this work.